27 research outputs found

    Identifying Redundancies in Fork-based Development

    A Dataset and Analysis of Open-Source Machine Learning Products

    Machine learning (ML) components are increasingly incorporated into software products, yet developers face challenges in transitioning from ML prototypes to products. Academic researchers struggle to propose solutions to these challenges and evaluate interventions because they often do not have access to closed-source ML products from industry. In this study, we define and identify open-source ML products, curating a dataset of 262 repositories from GitHub, to facilitate further research and education. As a start, we explore six broad research questions related to different development activities and report 21 findings from a sample of 30 ML products from the dataset. Our findings reveal a variety of development practices and architectural decisions surrounding different types and uses of ML models that offer ample opportunities for future research innovations. We also find very little evidence of industry best practices such as model testing and pipeline automation within the open-source ML products, which leaves room for further investigation to understand their potential impact on the development and eventual end-user experience of the products.

    An Exploratory Study to Find Motives Behind Cross-platform Forks from Software Heritage Dataset

    The fork-based development mechanism provides the flexibility and unified processes that let software teams collaborate easily in a distributed setting without too much coordination overhead. Currently, multiple social coding platforms support fork-based development, such as GitHub, GitLab, and Bitbucket. Although these platforms share virtually the same features, they have different emphases. As GitHub is the most popular platform and its data is publicly available, most current studies focus on GitHub-hosted projects. However, we observed anecdotal evidence that people are confused about choosing among these platforms and that some projects are migrating from one platform to another; the reasons behind these activities remain unknown. With the advances of the Software Heritage Graph Dataset (SWHGD), we have the opportunity to investigate forking activities across platforms. In this paper, we conduct an exploratory study on 10 popular open-source projects to identify cross-platform forks and investigate the motivation behind them. Preliminary results show that cross-platform forks do exist. For the 10 subject systems in this study, we found 81,357 forks in total, among which 179 forks are on GitLab. Based on our qualitative analysis, we found that most of the cross-platform forks that we identified are mirrors of repositories on another platform, but we still found cases that were created due to a preference for certain functionalities (e.g., Continuous Integration (CI)) supported by different platforms. This study lays the foundation for future research directions, such as understanding the differences between platforms and supporting cross-platform collaboration. Comment: Accepted at the 17th International Conference on Mining Software Repositories, October 5–6, 2020, Seoul, Republic of Korea

    The causal relationship between COVID-19 and seventeen common digestive diseases: a two-sample, multivariable Mendelian randomization study

    Objectives: In clinical practice, digestive symptoms such as nausea and vomiting are frequently observed in COVID-19 patients. However, the causal relationship between COVID-19 and digestive diseases remains unclear. Methods: We extracted single nucleotide polymorphisms associated with the severity of COVID-19 from summary data of genome-wide association studies. Summary statistics of common digestive diseases were primarily obtained from the UK Biobank study and the FinnGen study. Two-sample Mendelian randomization analyses were then conducted using the inverse variance-weighted (IVW), Mendelian randomization-Egger regression (MR Egger), weighted median estimation, weighted mode, and simple mode methods. IVW served as the primary analysis method, and multivariable Mendelian randomization analysis was employed to explore the mediating effect of body mass index (BMI) and type 2 diabetes. Results: MR analysis showed a causal association of SARS-CoV-2 infection (OR = 1.09, 95% CI 1.01–1.18, P = 0.03), severe COVID-19 (OR = 1.02, 95% CI 1.00–1.04, P = 0.02), and COVID-19 hospitalization (OR = 1.04, 95% CI 1.01–1.06, P = 0.01) with gastroesophageal reflux disease (GERD). Mediation analysis indicated that BMI served as the primary mediating variable in the causal relationship between SARS-CoV-2 infection and GERD, mediating 36% (95% CI 20–53%) of the effect. Conclusions: We found a causal relationship between SARS-CoV-2 infection and GERD, and this relationship is mainly mediated by BMI.

    Extracting Configuration Knowledge from Build Files with Symbolic Analysis

    LncRNA BCAR4, targeting to miR-665/STAT3 signaling, maintains cancer stem cells stemness and promotes tumorigenicity in colorectal cancer

    Background: Breast cancer anti-estrogen resistance 4 (BCAR4) is closely associated with colorectal cancer (CRC) initiation and propagation. However, the mechanisms underlying BCAR4 function in colon cancer remain largely unknown. In this study, we hypothesized that BCAR4 could regulate colon cancer stem/initiating cell (CSC) function and further facilitate colon cancer progression. Methods: qRT-PCR was used to examine the expression of BCAR4 and various CSC markers. FACS, acetaldehyde dehydrogenase (ALDH) activity, and western blot assays were used to test the expression of CSC markers. CCK8, tumorsphere formation, and transwell assays were used to examine the proliferation, self-renewal, and migration capacity of CRC cells. A pull-down assay was used to test the interaction between BCAR4 and miR-665. A luciferase reporter assay was used to examine the interaction between miR-665 and signal transducer and activator of transcription 3 (STAT3). An in vivo tumor xenograft study was used to verify the malignancy of CRC cells with inhibition of BCAR4. Results: BCAR4 was highly expressed in both CRC cells and stem/initiating cells. In addition, overexpression of BCAR4 facilitated the maintenance of stemness in ALDH-positive cells (a type of cancer stem/initiating cell) and promoted the proliferation and migration of ALDH+ cells. Inhibition of BCAR4 restricted the proliferation and migration of ALDH+ cells. We further showed that miR-665 was the target of BCAR4 and that BCAR4 subsequently activated STAT3 signaling, which is an important pathway in cancer stem cell self-renewal. Conclusions: BCAR4 promotes CRC cell stemness by targeting miR-665/STAT3 signaling, and the identification of BCAR4 in CRC stem cells provides new insight into CRC diagnosis, treatment, prognosis, and next-step translational investigations.

    Pancancer analysis uncovers an immunological role and prognostic value of the m6A reader IGF2BP2 in pancreatic cancer

    Introduction: Pancreatic ductal adenocarcinoma (PDAC) is one of the most malignant gastrointestinal tumors worldwide, with a dismal prognosis and high relapse rate. PDAC is considered a “cold cancer” for which immunotherapy is not effective. Therefore, to improve the prognosis for PDAC patients, it is urgent to explore the mechanism driving its insensitivity to immunotherapy. Materials and methods: We conducted pancancer analyses to test IGF2BP family expression and survival in patients with different cancers via the TCGA and GTEx databases. Then, we determined the immunological role and prognostic value of IGF2BP2 in vitro, in vivo, and in clinical specimens. Results: In the present study, we found that the m6A reader IGF2BP2 was the most clinically relevant member of the IGF2BP family for pancreatic cancer. High expression of IGF2BP2 was most associated with poor prognosis and an immunosuppressive microenvironment in PDAC. After IGF2BP2 knockdown, tumor cell proliferation and invasive ability were significantly diminished. Importantly, we found that IGF2BP2 expression was closely associated with high expression of immunosuppressive molecules such as PD-L1. IGF2BP2 modulated downstream PD-L1 expression by regulating its mRNA stability via m6A methylation control, and we obtained the same verification in animal experiments and human tissue specimens. Conclusion: Our study contributes to existing knowledge regarding the IGF2BP2-regulated PD-L1 signaling pathway as a potential prognostic and immune biomarker in pancreatic cancer.

    Extracting Configuration Knowledge from Build Files with Symbolic Analysis

    Build systems contain a lot of configuration knowledge about a software system, such as under which conditions specific files are compiled. Extracting such configuration knowledge is important for many tools analyzing highly-configurable systems, but very challenging due to the complex nature of build systems. We design an approach, based on SYMake, that symbolically evaluates Make files and extracts configuration knowledge in terms of file presence conditions and conditional parameters. We implement an initial prototype and demonstrate feasibility on small examples.